Massively Parallel Programming: A Hands-On Approach: Beyond the Sequential Ceiling

The End of the 'Free Lunch'

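For decades, developers enjoyed a "free lunch": under Dennard scaling, every new chip generation delivered higher clock speeds, so sequential code ran faster without being changed. We have now hit the Power Wall, and with it the Sequential Ceiling. Performance is no longer a function of frequency; it is a function of concurrency. Moving forward requires Computational Thinking to bridge the gap between abstract numerical methods and modern parallel execution models.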
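The lesson text does not state Amdahl's Law, although the quiz below relies on it. As a rough sketch of why concurrency alone has a ceiling too, the snippet below evaluates the standard formula S(n) = 1 / (s + (1 - s) / n), where s is the strictly sequential fraction and n is the number of workers. The function names `amdahl_speedup` and `amdahl_limit` and the 10% figure are illustrative assumptions, not part of the course material.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a given number of workers.

    serial_fraction: share of the runtime that cannot be parallelized (0..1).
    workers: number of parallel workers (>= 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(serial_fraction: float) -> float:
    """Upper bound on speedup as the worker count grows without limit."""
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    s = 0.10  # hypothetical: 10% of the program is strictly sequential
    for n in (1, 8, 64, 1024):
        print(f"workers={n:5d}  speedup={amdahl_speedup(s, n):.2f}")
    print(f"limit as workers grow: {amdahl_limit(s):.2f}")
```

Under these assumptions the speedup flattens well before the worker count runs out, which is why reducing the sequential fraction usually matters more than adding cores.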

The Precision-Performance Conflict

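Moving a domain problem (Molecular Dynamics, for example) from a multicore host to a CUDA device is more than a syntax change. It changes how the problem is decomposed. Parallelizing often reorders operations. Because floating-point arithmetic is not associative, we face a trade-off: floating-point precision versus accuracy. A parallel result can be mathematically valid and still differ numerically from the sequential result.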
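The reordering effect can be reproduced on the host without a GPU. The sketch below compares a left-to-right loop with a pairwise (tree-style) reduction, which is the general shape parallel reductions tend to take. The helper names `sequential_sum` and `pairwise_sum` and the input values are illustrative assumptions; this is not a CUDA kernel, only a model of the summation order.

```python
import math


def sequential_sum(values):
    """Accumulate left to right, as a single-threaded loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree-style reduction: sum each half independently, then combine.

    This mimics the order of operations in a parallel reduction,
    where partial sums are computed separately before being merged.
    """
    n = len(values)
    if n == 0:
        return 0.0
    if n == 1:
        return values[0]
    mid = n // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Hypothetical input chosen so that small terms are absorbed by large ones.
values = [1e16, 1.0, -1e16, 1.0]

if __name__ == "__main__":
    print("sequential:", sequential_sum(values))  # 1.0
    print("pairwise:  ", pairwise_sum(values))    # 0.0
    print("math.fsum: ", math.fsum(values))       # 2.0 (correctly rounded)
```

Both orderings are legitimate evaluations of the same mathematical sum, yet they disagree with each other and with the correctly rounded result from `math.fsum`. In a Molecular Dynamics setting, this is one reason parallel results are typically validated against a tolerance rather than expected to match bit for bit.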

QUESTION 1

What is the primary reason the 'Sequential Ceiling' was reached?

The end of Moore's Law entirely.

Thermal limits and the Power Wall hindering frequency scaling.

Lack of developer interest in C++.

The transition to quantum computing.

QUESTION 2

According to Amdahl's Law, if 5% of a program is strictly sequential, what is the maximum theoretical speedup?

Infinite speedup.

Approximately 20x.

5x.

100x.

QUESTION 3

Why might a parallel Molecular Dynamics simulation yield slightly different results than a sequential one?

The CPU uses 64-bit while the GPU only uses 8-bit.

Floating-point addition is non-associative, and parallel execution changes the order of operations.

Parallel threads randomly skip calculations.

The CUDA compiler ignores numerical methods.

QUESTION 4

What does 'Problem Decomposition' involve in the context of parallel programming?

Breaking code into functions for readability.

Mapping domain-specific data to parallel execution models like threads or grids.

Deleting unnecessary variables to save memory.

Compiling the code for multiple OS targets.

QUESTION 5

Which of the following describes the 'Computational Thinking' bridge?

A hardware component between the CPU and GPU.

A framework to translate domain knowledge into architecture-aware algorithms.

An automated AI tool that writes CUDA kernels.

The process of upgrading RAM on a host machine.